Two-Phase Dynamics of Interactions Explains the Starting Point of a DNN Learning Over-Fitted Features

Zhang, Junpeng, Li, Qing, Lin, Liang, Zhang, Quanshi

arXiv.org Artificial Intelligence

This paper investigates the dynamics of a deep neural network (DNN) learning interactions. Previous studies have discovered and mathematically proven that, given each input sample, a well-trained DNN usually encodes only a small number of interactions (non-linear relationships) between input variables in the sample. A series of theorems has been derived to prove that the DNN's inference can be considered equivalent to using these interactions as primitive inference patterns. In this paper, we discover that a DNN learns interactions in two phases. The first phase mainly penalizes interactions of medium and high orders, and the second phase mainly learns interactions of gradually increasing orders. The two-phase phenomenon can be considered the starting point of a DNN learning over-fitted features, and it is widely shared by DNNs with various architectures trained for different tasks. The discovery of the two-phase dynamics therefore provides a detailed mechanism for how a DNN gradually learns different inference patterns (interactions). In particular, we have also verified the claim that high-order interactions have weaker generalization power than low-order interactions; thus, the discovered two-phase dynamics also explains how the generalization power of a DNN changes during training.
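In this line of work, the interaction encoded for a subset S of input variables is typically quantified as a Harsanyi dividend over the model's outputs on masked inputs; the abstract does not give the formula, so the following is only a minimal sketch under that assumption, with a toy value function `v` standing in for a real DNN:

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of the tuple s, including the empty set."""
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def harsanyi_interaction(v, S):
    """Harsanyi dividend: I(S) = sum over T subset of S of (-1)^(|S|-|T|) v(T).

    The 'order' of the interaction is |S|; v(T) is the model output when
    only the variables in T are kept and the rest are masked.
    """
    return sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))

# Toy value function: a pure AND pattern between variables 0 and 1.
v = lambda T: 1.0 if 0 in T and 1 in T else 0.0

# The pair (0, 1) carries a non-zero order-2 interaction ...
print(harsanyi_interaction(v, (0, 1)))  # 1.0
# ... while each single variable alone carries none.
print(harsanyi_interaction(v, (0,)))    # 0.0
```

Under this metric, "penalizing high-order interactions" means the dividends of large subsets S shrink toward zero during the first training phase.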


Learning nonparametric DAGs with incremental information via high-order HSIC

Wang, Yafei, Liu, Jianguo

arXiv.org Machine Learning

Score-based methods for learning Bayesian networks (BNs) aim to maximize a global score function. However, if local variables exhibit both direct and indirect dependence, global optimization of the score function misses edges between indirectly dependent variables, whose scores are smaller than those of directly dependent ones. In this paper, we present an identifiability condition, based on a determined subset of parents, for identifying the underlying DAG. Building on this condition, we develop a two-phase algorithm, the optimal-tuning (OT) algorithm, that locally amends the global optimization. In the optimal phase, an optimization problem based on the first-order Hilbert-Schmidt independence criterion (HSIC) gives an estimated skeleton as the initial determined parent subset. In the tuning phase, the skeleton is locally tuned by deletion, addition, and DAG-formalization strategies using theoretically proven incremental properties of high-order HSIC. Numerical experiments on synthetic and real-world datasets show that the OT algorithm outperforms existing methods. In particular, in the Sigmoid Mix model with graph size ${\rm\bf d=40}$, the structural intervention distance (SID) of the OT algorithm is 329.7 smaller than that of CAM, indicating that the graph estimated by the OT algorithm misses fewer edges. Source code of the OT algorithm is available at https://github.com/YafeiannWang/optimal-tune-algorithm.
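The paper's first- and high-order HSIC variants are specific to its construction and not reproduced here, but the standard (biased) HSIC estimator they extend can be sketched as follows, assuming Gaussian kernels with a fixed bandwidth:

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for a 1-D sample vector x."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimator: trace(K H L H) / (n - 1)^2,
    where H = I - (1/n) 11^T centers the kernel matrices."""
    n = len(x)
    K, L = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
h_dep = hsic(x, x ** 2 + 0.1 * rng.normal(size=200))  # nonlinear dependence
h_ind = hsic(x, rng.normal(size=200))                  # independent noise
print(h_dep, h_ind)  # the dependent pair scores clearly higher
```

HSIC is zero (in population) exactly when the two variables are independent, which is what makes it usable as an edge-detection score between variables.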


Implicit Wiener Series for Higher-Order Image Analysis

Neural Information Processing Systems

The computation of classical higher-order statistics such as higher-order moments or spectra is difficult for images due to the huge number of terms to be estimated and interpreted. We propose an alternative approach in which multiplicative pixel interactions are described by a series of Wiener functionals. Since the functionals are estimated implicitly via polynomial kernels, the combinatorial explosion associated with classical higher-order statistics is avoided. First results show that image structures such as lines or corners can be predicted correctly, and that pixel interactions up to order five play an important role in natural images. Most of the interesting structure in a natural image is characterized by its higher-order statistics.
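The implicit estimation rests on a standard kernel identity: a degree-p polynomial kernel equals the inner product of all monomial features up to order p, without ever enumerating them. A minimal sketch, with the degree-2 feature map `phi2` written out explicitly only to verify the identity:

```python
import numpy as np

def poly_kernel(x, y, degree=5):
    """(x.y + 1)^p == <phi(x), phi(y)> over all monomials up to order p,
    computed without enumerating the monomials."""
    return (x @ y + 1.0) ** degree

# Explicit feature map for degree 2 on 2-D inputs:
# phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2)
def phi2(x):
    return np.array([1.0, np.sqrt(2) * x[0], np.sqrt(2) * x[1],
                     x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(poly_kernel(x, y, degree=2), phi2(x) @ phi2(y))  # equal up to rounding
```

For images, the explicit map for order-5 pixel interactions would be astronomically large, while the kernel evaluation stays a single dot product per pair, which is exactly the combinatorial explosion the abstract says is avoided.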


Uniform tensor clustering by jointly exploring sample affinities of various orders

Cai, Hongmin, Qi, Fei, Li, Junyu, Hu, Yu, Zhang, Yue, Cheung, Yiu-ming, Hu, Bin

arXiv.org Artificial Intelligence

Conventional clustering methods based on pairwise affinity usually suffer from the concentration effect when processing high-dimensional, low-sample-size data, encoding sample proximity inaccurately and yielding suboptimal clustering performance. To address this issue, we propose a unified tensor clustering method (UTC) that characterizes sample proximity using the affinity of multiple samples, thereby supplementing rich spatial sample distributions to boost clustering. Specifically, we find that the triadic tensor affinity can be constructed via the Khatri-Rao product of two affinity matrices. Furthermore, our earlier work shows that the fourth-order tensor affinity is defined by the Kronecker product. We therefore use these arithmetical products, the Khatri-Rao and Kronecker products, to mathematically integrate different orders of affinity into a unified tensor clustering framework, in which UTC learns a joint low-dimensional embedding that combines the various orders. Finally, a numerical scheme is designed to solve the problem. Experiments on synthetic and real-world datasets demonstrate that 1) high-order tensor affinity provides a supplementary characterization of sample proximity beyond the popular affinity matrix, and 2) UTC enhances clustering by exploiting different-order affinities when processing high-dimensional data.
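The Khatri-Rao product named above is the column-wise Kronecker product. How UTC then processes the resulting tensor is specific to the paper, but building a matricized triadic affinity from two pairwise affinity matrices can be sketched as follows (the affinity matrix `W` here is random toy data, not a real similarity graph):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: A (m, k) and B (n, k) -> (m*n, k).
    Column j of the result is kron(A[:, j], B[:, j])."""
    m, k = A.shape
    n, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(m * n, k)

# Two pairwise affinity matrices over the same n samples ...
n = 4
rng = np.random.default_rng(1)
W = np.exp(-rng.random((n, n)))
W = (W + W.T) / 2  # symmetrize, as affinities usually are

# ... yield an n^2 x n matricization of the triadic affinity tensor:
# entry ((i, j), l) couples the sample triple (i, j, l).
T = khatri_rao(W, W)
print(T.shape)  # (16, 4)
```

SciPy also ships this operation as `scipy.linalg.khatri_rao`; the explicit broadcast above just makes the column-wise Kronecker structure visible.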


Growing Robot Minds – MetaDevo AI Blog

#artificialintelligence

One way to increase the intelligence of a robot might be to train it with a series of missions, analogous to the missions or levels in a video game. In a developmental robot, the training would not be simply learning--its "brain" structure would actually change. Biological development shows some extremes that a robot could go through, like starting with a small seed that constructs itself, or creating too many neural connections and then deleting a whole bunch of them in a later phase. As another example of development vs. learning: a simple artificial neural network has been trained when its weights have been changed after a series of training inputs (plus error correction, if it is supervised). Development, by contrast, would be like growing completely new nodes, network layers, or entirely new networks during each training level. Or you can imagine the difference between decorating a skyscraper (learning) and building a skyscraper (development).


Germany could have WON the Battle of Britain if they started earlier, study finds

Daily Mail - Science & tech

A mathematical study claims to have proven the long-held belief that the Battle of Britain could easily have been won by the Germans if not for tactical ineptitude. University of York researchers created a computer model that uses a statistical technique called 'weighted bootstrapping' to re-imagine the 1940 battle under different circumstances. It identifies two enormous blunders by the notorious Nazi commander Hermann Goering - a trained fighter pilot who led the assault - that crippled the Nazi effort and helped Britain win. The researchers say the model provides statistical backing to many historians' belief that if Germany had launched an attack immediately after Winston Churchill's famous 'Battle of Britain' speech on June 18, rather than three weeks later on July 10, and had targeted airfields rather than cities and populated areas, the Nazis would probably have been victorious. This would have crippled the British response by decimating the number of fighter pilots and destroying the vital radar systems used to track German planes, paving the way for a naval and land invasion.
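The article gives no implementation details, so the following is only a minimal sketch of the general technique it names: a weighted bootstrap resamples the historical record with non-uniform probabilities to simulate counterfactual campaigns. All numbers below (daily losses, weights) are hypothetical placeholders, not data from the study:

```python
import random

# Hypothetical daily RAF fighter losses from a historical record.
daily_losses = [3, 7, 2, 11, 5, 9, 4, 6]

# Counterfactual weights: days matching the alternative scenario
# (e.g. airfield attacks) are made more likely to be redrawn.
weights = [1, 3, 1, 3, 1, 3, 1, 3]

def weighted_bootstrap(data, weights, n_resamples=10_000, seed=42):
    """Resample whole campaigns day-by-day with non-uniform
    probabilities and return the mean simulated campaign loss."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_resamples):
        sample = rng.choices(data, weights=weights, k=len(data))
        totals.append(sum(sample))
    return sum(totals) / len(totals)

print(weighted_bootstrap(daily_losses, weights))
```

Comparing the resampled distribution under counterfactual weights against the one under uniform weights is what lets such a model attach probabilities to "what if" scenarios.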


AdaGCN: Adaboosting Graph Convolutional Networks into Deep Models

Sun, Ke, Lin, Zhouchen, Zhu, Zhanxing

arXiv.org Machine Learning

The design of deep graph models remains underexplored, and the crucial question is how to explore and exploit the knowledge from different hops of neighbors efficiently. In this paper, we propose a novel RNN-like deep graph neural network architecture that incorporates AdaBoost into the computation of the network. The proposed graph convolutional network, AdaGCN (AdaBoosting Graph Convolutional Network), can efficiently extract knowledge from high-order neighbors and integrate knowledge from different hops of neighbors into the network in an AdaBoost fashion. We also present the architectural differences between AdaGCN and existing graph convolutional methods to show the benefits of our proposal. Finally, extensive experiments demonstrate the state-of-the-art prediction performance and the computational advantage of AdaGCN.
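The per-hop features AdaGCN builds on come from repeatedly applying a normalized adjacency matrix; the AdaBoost combination of per-hop base classifiers is the paper's contribution and is only indicated in comments here. A minimal sketch on a toy graph:

```python
import numpy as np

def normalized_adj(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops,
    as commonly used by graph convolutional networks."""
    A_hat = A + np.eye(len(A))
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

# Toy graph: 4 nodes in a path, 2 features per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)

A_norm = normalized_adj(A)

# l-th hop features A_norm^l X: in AdaGCN, one base classifier is fit
# per hop and their predictions are combined with AdaBoost weights.
hop_features = [X]
for _ in range(3):
    hop_features.append(A_norm @ hop_features[-1])
print(len(hop_features), hop_features[-1].shape)  # 4 hops, each (4, 2)
```

Because the same base-classifier parameters are reused across hops, depth here costs repeated sparse matrix-vector products rather than extra stacked layers, which is the source of the computational advantage the abstract claims.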


Knowledge Isomorphism between Neural Networks

Liang, Ruofan, Li, Tianlin, Li, Longfei, Zhang, Quanshi

arXiv.org Machine Learning

This paper aims to analyze knowledge isomorphism between pre-trained deep neural networks. We propose a generic definition of knowledge isomorphism between neural networks at different fuzziness levels, and design a task-agnostic and model-agnostic method to disentangle and quantify isomorphic features from the intermediate layers of a neural network. As a generic tool, our method can be broadly used in different applications. In preliminary experiments, we have used knowledge isomorphism as a tool to diagnose the feature representations of neural networks. Knowledge isomorphism provides new insights into the success of existing deep-learning techniques, such as knowledge distillation and network compression. More crucially, it has been shown that knowledge isomorphism can also be used to refine pre-trained networks and boost performance.